A Study of Loop Unrolling for VLIW-based DSP Processors
Authors
Abstract
With the growing popularity of DSPs and their associated applications, cost-effective software development has become a major issue. High-level language compilers are becoming more commonplace in the DSP world. While these compilers can generate correct code for DSP architectures, there remains considerable room for performance improvements. This paper addresses issues related to DSP compilation, focusing specifically on unrolling techniques proposed for VLIW-based DSP architectures.
Similar resources
Global Trade-off between Code Size and Performance for Loop Unrolling on VLIW Architectures
Many media processors [28, 7, 14, 8, 18, 27], used for computing intensive embedded applications, are VLIW architectures that rely on the compiler to exploit Instruction Level Parallelism. Loop unrolling is generally used to expose instruction parallelism but computing the unrolling factor is very difficult as instruction cache misses and spill code can cancel the expected benefit of the transforma...
UFC: a Global Trade-off Strategy for Loop Unrolling for VLIW Architecture
In order to minimize code size overhead on VLIW architectures, compilers for embedded processors have to pay higher attention on code optimization than on compilation time. Thus, the first demand on compiler for embedded processors consists in spending instruction memory space for optimization only if the associated performance improvement justifies it. In this paper, we propose a novel method ba...
Implementing Click IP Router Kernel on VLIW Architectures
In this work, we implemented the Click IP Router Kernel in C language provided by Scott Webber et al. for two VLIW processors designed for DSP purposes, namely the Philips Trimedia TM1300 processor and the Texas Instruments TMS320C6701 processor. The performance of these processors is compared with that of three other processors, ARM SA-110, HPL-PD EPIC, and Intel IXP1200 [1]. Ways of further perfo...
Optimization of SAD Algorithm on VLIW DSP
The SAD (Sum of Absolute Difference) algorithm is heavily used in motion estimation, which is a computationally demanding process in motion picture encoding. To enhance the performance of motion picture encoding on a VLIW processor, an efficient implementation of the SAD algorithm on the VLIW processor is essential. The SAD algorithm is programmed as a nested loop with a conditional branch. In VLIW pro...
Assembly Code Conversion of Software-Pipelined Loop between two VLIW DSP Processors
In order to fully utilize the instruction level parallelism of VLIW DSP processors, DSP programs have to be optimized by software pipelining. Software pipelining has been studied for many years and widely implemented in optimizing compilers. However, due to the rearrangement of the original instructions, it is often very difficult to re-use or port the code of a software-pipelined loop to other...